AITopics | gradient perturbation


gradient perturbation


Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization

Bargav Jayaraman, Lingxiao Wang, David Evans, Quanquan Gu

Neural Information Processing Systems, Feb-19-2026, 20:30:07 GMT

Neural Information Processing Systems http://nips.cc/

differential privacy, noise, privacy, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Data Science > Data Mining (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Gradient perturbation: For a parametric function $f_\theta(x)$ parameterized by $\theta$ and loss function $L(f_\theta(x), y)$, the usual mini-batched first-order optimizers update $\theta$ using gradients $g_t = \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta L(f_\theta(x_i), y_i)$

Neural Information Processing Systems, Feb-9-2026, 03:32:50 GMT

In addition to the notations defined in Sec. Note that we use a slightly different notation compared to the main text, because it is more convenient to deal with empirical distributions rather than samples when relating to the dual formulation later on. Thus, once we find the optimal $f$ and $g$, we can obtain $P_\lambda$ through this primal-dual relationship. Readers can refer to [59] for further details. Under gradient perturbation, the gradient $g_t$ is first clipped in $L_2$ norm by a constant, and then noise sampled from $\mathcal{N}(0, \sigma^2 I)$ is added.
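The gradient-perturbation step described in this snippet can be sketched in Python with NumPy. This is a minimal illustration, not the paper's code: the function name `perturbed_gradient`, the clipping constant `clip_norm` (standing in for the unnamed constant) and the noise scale `sigma` are our own assumptions. Following the snippet's wording, it clips the averaged mini-batch gradient $g_t$; many DP-SGD implementations instead clip each per-example gradient before averaging.

```python
import numpy as np


def perturbed_gradient(per_example_grads, clip_norm, sigma, rng):
    """Illustrative gradient-perturbation step (hypothetical helper).

    per_example_grads: array of shape (N, d); row i is the gradient of
        L(f_theta(x_i), y_i) with respect to theta.
    clip_norm: assumed clipping constant C for the L2 norm.
    sigma: assumed standard deviation of the Gaussian noise.
    """
    # Mini-batch gradient g_t = (1/N) * sum_i grad_i
    g_t = per_example_grads.mean(axis=0)
    # Clip g_t in L2 norm: rescale only when its norm exceeds clip_norm
    norm = np.linalg.norm(g_t)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    g_t = g_t * scale
    # Add noise sampled from N(0, sigma^2 I)
    return g_t + rng.normal(0.0, sigma, size=g_t.shape)


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [3.0, 4.0]])  # mean gradient has L2 norm 5
noisy_g = perturbed_gradient(grads, clip_norm=1.0, sigma=0.5, rng=rng)
```

The optimizer would then take its usual step with the perturbed gradient, e.g. $\theta \leftarrow \theta - \eta\,\tilde g_t$ for an assumed learning rate $\eta$.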

artificial intelligence, gradient perturbation, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization

Bargav Jayaraman, Lingxiao Wang, David Evans, Quanquan Gu

Neural Information Processing Systems, Nov-20-2025, 17:22:30 GMT

In our output perturbation method, the parties combine local models within a secure computation and then add the required differential privacy noise before revealing the model.
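A minimal sketch of the output-perturbation idea, assuming that "combine" means averaging the local parameter vectors and that the required noise is Gaussian with a given scale `sigma` (both assumptions; how the noise scale is calibrated is not stated in this snippet). The secure computation itself is not modeled: here the aggregation runs in the clear, whereas in the described method no party would see the unperturbed aggregate.

```python
import numpy as np


def output_perturbation(local_models, sigma, rng):
    """Aggregate local models, then add noise once before release.

    Assumptions (not from the source): aggregation = plain averaging,
    noise = N(0, sigma^2 I). The secure-computation layer is omitted.
    """
    aggregate = np.mean(np.stack(local_models), axis=0)
    # Noise is added a single time, to the combined model only
    noise = rng.normal(0.0, sigma, size=aggregate.shape)
    return aggregate + noise


rng = np.random.default_rng(1)
released = output_perturbation([np.array([1.0, 2.0]), np.array([3.0, 4.0])], 0.1, rng)
```

Adding noise once to the aggregate, rather than to every local model, is what lets the noise be calibrated to the combined model instead of to each party's smaller data set.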

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Data Science > Data Mining (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Combining Variational Modeling with Partial Gradient Perturbation to Prevent Deep Gradient Leakage

Scheliga, Daniel, Mäder, Patrick, Seeland, Marco

arXiv.org Artificial Intelligence, Aug-9-2022

Exploiting gradient leakage to reconstruct supposedly private training data, gradient inversion attacks are a ubiquitous threat in collaborative learning of neural networks. To prevent gradient leakage without suffering a severe loss in model performance, recent work proposed a PRivacy EnhanCing mODulE (PRECODE) based on variational modeling as an extension for arbitrary model architectures. In this work, we investigate the effect of PRECODE on gradient inversion attacks to reveal its underlying working principle. We show that variational modeling induces stochasticity in the gradients of PRECODE and its subsequent layers, which prevents gradient attacks from converging. By purposefully omitting those stochastic gradients during attack optimization, we formulate an attack that can disable PRECODE's privacy-preserving effects. To ensure privacy preservation against such targeted attacks, we propose PRECODE with Partial Perturbation (PPP), a strategic combination of variational modeling and partial gradient perturbation. We conduct an extensive empirical study on four seminal model architectures and two image classification datasets. We find all architectures to be prone to gradient leakage, which can be prevented by PPP. As a result, we show that our approach requires less gradient perturbation to effectively preserve privacy without harming model performance.
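Partial gradient perturbation, as named in the abstract, means adding noise to only some layers' gradients before they are shared. The sketch below shows that mechanic only. Which layers PPP selects is not stated here; a plausible reading of the abstract is that noise targets the layers whose gradients are not already made stochastic by PRECODE (those before it), but that is our assumption, as are the layer names and `sigma`.

```python
import numpy as np


def partially_perturb(layer_grads, perturbed_layers, sigma, rng):
    """Add Gaussian noise only to the gradients of selected layers.

    layer_grads: dict mapping a (hypothetical) layer name to its gradient.
    perturbed_layers: set of layer names that receive noise; the choice
        of layers is an assumption, not taken from the paper.
    """
    shared = {}
    for name, grad in layer_grads.items():
        if name in perturbed_layers:
            shared[name] = grad + rng.normal(0.0, sigma, size=grad.shape)
        else:
            shared[name] = grad.copy()  # shared without perturbation
    return shared


rng = np.random.default_rng(0)
grads = {"conv1": np.ones(3), "precode": np.ones(2), "fc": np.ones(2)}
shared = partially_perturb(grads, {"conv1"}, sigma=0.1, rng=rng)
```

Perturbing fewer layers is one way to read the abstract's claim that PPP needs less gradient perturbation overall than perturbing every gradient.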

gradient, model performance, precode, (15 more...)

arXiv.org Artificial Intelligence

2208.04767

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Germany > Lower Saxony > Wolfsburg (0.04)
Europe > Austria > Upper Austria > Linz (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Easy Batch Normalization

Asadulaev, Arip, Panfilov, Alexander, Filchenkov, Andrey

arXiv.org Artificial Intelligence, Jul-18-2022

Prior work has shown that adversarial examples improve object recognition. But what about their opposite, easy examples? Easy examples are samples that the machine learning model classifies correctly with high confidence. In our paper, we take the first step toward exploring the potential benefits of using easy examples in the training procedure of neural networks. We propose using an auxiliary batch normalization for easy examples to improve both standard and robust accuracy.
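The definition of easy examples above can be turned into a selection rule, and the auxiliary batch normalization into separate normalization statistics for the selected samples. This NumPy sketch rests on assumptions not given in the abstract: the confidence `threshold`, the use of softmax probabilities as confidence, and routing easy examples to the auxiliary BN while all other samples use the main BN. The BN here has no learnable scale/shift and no running statistics.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def easy_mask(logits, labels, threshold):
    """True where the prediction is correct AND confidence >= threshold."""
    probs = softmax(logits)
    correct = probs.argmax(axis=1) == labels
    confident = probs.max(axis=1) >= threshold
    return correct & confident


def batch_norm(x, eps=1e-5):
    """Normalize each feature with the mean/variance of this (sub-)batch."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def dual_batch_norm(features, is_easy):
    """Easy samples get auxiliary-BN statistics; the rest use the main BN."""
    out = np.empty_like(features)
    for mask in (is_easy, ~is_easy):
        if mask.any():
            out[mask] = batch_norm(features[mask])
    return out


logits = np.array([[5.0, 0.0], [0.1, 0.0], [0.0, 5.0], [4.0, 0.0]])
labels = np.array([0, 0, 0, 0])
is_easy = easy_mask(logits, labels, threshold=0.9)
```

In this example the second sample is correct but not confident and the third is misclassified, so only the first and fourth count as easy. Keeping the statistics separate prevents the easy samples from shifting the normalization used for the rest of the batch.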

artificial intelligence, easy example, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2207.0894

Country:

North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.05)
Asia > Russia (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(9 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)


Differentially Private SGD with Non-Smooth Loss

Wang, Puyu, Lei, Yunwen, Ying, Yiming, Zhang, Hai

arXiv.org Machine Learning, Jan-21-2021

In this paper, we are concerned with differentially private SGD algorithms in the setting of stochastic convex optimization (SCO). Most existing work requires the loss to be Lipschitz continuous and strongly smooth, and the model parameter to be uniformly bounded. However, these assumptions are restrictive, as many popular losses violate these conditions, including the hinge loss for SVM, the absolute loss in robust regression, and even the least squares loss in an unbounded domain. We significantly relax these restrictive assumptions and establish privacy and generalization (utility) guarantees for private SGD algorithms using output and gradient perturbations associated with non-smooth convex losses. Specifically, the loss function is relaxed to have an $\alpha$-H\"{o}lder continuous gradient (referred to as $\alpha$-H\"{o}lder smoothness), which instantiates Lipschitz continuity ($\alpha=0$) and strong smoothness ($\alpha=1$). We prove that noisy SGD with $\alpha$-H\"{o}lder smooth losses using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain the optimal excess population risk $O\Big(\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}+\frac{1}{\sqrt{n}}\Big)$, up to logarithmic terms, with gradient complexity (i.e., the total number of iterations) $T = O\big(n^{\frac{2-\alpha}{1+\alpha}} + n\big)$. This shows an important trade-off between the $\alpha$-H\"{o}lder smoothness of the loss and the computational complexity $T$ for private SGD with statistically optimal performance. In particular, our results indicate that $\alpha$-H\"{o}lder smoothness with $\alpha \ge 1/2$ is sufficient to guarantee $(\epsilon,\delta)$-DP of noisy SGD algorithms while achieving optimal excess risk with linear gradient complexity $T = O(n)$.
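The trade-off between $\alpha$ and the gradient complexity can be checked with a few lines of arithmetic. The sketch below only evaluates the orders stated in the abstract, dropping constants and logarithmic factors; the function names are our own.

```python
import math


def complexity_exponent(alpha):
    """Exponent e in T = O(n^e + n), with e = (2 - alpha) / (1 + alpha)."""
    return (2 - alpha) / (1 + alpha)


def gradient_complexity(n, alpha):
    """Order of T, ignoring constants and logarithmic terms."""
    return n ** complexity_exponent(alpha) + n


def excess_risk_order(n, d, eps, delta):
    """Order of the excess population risk bound, constants dropped."""
    return math.sqrt(d * math.log(1 / delta)) / (n * eps) + 1 / math.sqrt(n)


for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, complexity_exponent(alpha))
```

Since the exponent $(2-\alpha)/(1+\alpha)$ decreases in $\alpha$ and equals 1 at $\alpha = 1/2$, any $\alpha \ge 1/2$ leaves the linear term $n$ dominant, giving $T = O(n)$; at $\alpha = 0$ (Lipschitz only) the exponent is 2, i.e. $T = O(n^2)$.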

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2101.08925

Country:

North America > United States > New York > Albany County > Albany (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Fu, Hao, Zhou, Shaojun, Yang, Qihong, Tang, Junjie, Liu, Guiquan, Liu, Kaikui, Li, Xiaolong

arXiv.org Artificial Intelligence, Dec-14-2020

Pre-trained models such as BERT have achieved strong results on various natural language processing problems. However, their large number of parameters requires significant memory and inference time, which makes them difficult to deploy on edge devices. In this work, we propose LRC-BERT, a knowledge distillation method based on contrastive learning that fits the output of the intermediate layer in terms of angular distance, an aspect not considered by existing distillation methods. Furthermore, we introduce a gradient perturbation-based training architecture in the training phase to increase the robustness of LRC-BERT, which is the first such attempt in knowledge distillation. Additionally, to better capture the distribution characteristics of the intermediate layer, we design a two-stage training method for the total distillation loss. Finally, in experiments on 8 datasets from the General Language Understanding Evaluation (GLUE) benchmark, the proposed LRC-BERT outperforms existing state-of-the-art methods, demonstrating the effectiveness of our method.
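The abstract does not give LRC-BERT's loss, so the following is a generic stand-in rather than the paper's formulation: an InfoNCE-style contrastive loss on cosine similarity (i.e., angular distance) between student and teacher intermediate representations. Each student row's positive is the matching teacher row, and the other teacher rows in the batch act as negatives. The function names and the `temperature` parameter are assumptions.

```python
import numpy as np


def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def contrastive_angular_loss(student, teacher, temperature=1.0):
    """Generic InfoNCE-style loss on cosine similarity (not LRC-BERT's exact loss).

    Row i of `student` is pulled toward row i of `teacher` (small angle)
    and pushed away from the other teacher rows in the batch.
    """
    sims = cosine_sim_matrix(student, teacher) / temperature
    # Row-wise log-softmax; the positives sit on the diagonal
    sims = sims - sims.max(axis=1, keepdims=True)
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


student = np.eye(3)
aligned = contrastive_angular_loss(student, np.eye(3))
misaligned = contrastive_angular_loss(student, np.eye(3)[[1, 2, 0]])
```

Because cosine similarity ignores vector length, a loss of this shape constrains only the direction of the student's representations, which matches the "angular distance" framing in the abstract.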

distillation, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2012.07335

Country:

Europe > France (0.04)
Asia > China (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (1.00)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)


Differentially Private ADMM Algorithms for Machine Learning

Xu, Tao, Shang, Fanhua, Liu, Yuanyuan, Liu, Hongying, Shen, Longjie, Gong, Maoguo

arXiv.org Artificial Intelligence, Oct-30-2020

In this paper, we study efficient differentially private alternating direction methods of multipliers (ADMM) via gradient perturbation for many machine learning problems. For smooth convex loss functions with (non-)smooth regularization, we propose the first differentially private ADMM (DP-ADMM) algorithm with a performance guarantee of $(\epsilon,\delta)$-differential privacy ($(\epsilon,\delta)$-DP). For the theoretical analysis, we use the Gaussian mechanism and the conversion relationship between R\'enyi Differential Privacy (RDP) and DP to perform a comprehensive privacy analysis of our algorithm. We then establish a new criterion to prove the convergence of the proposed algorithms, including DP-ADMM. We also give a utility analysis of DP-ADMM. Moreover, we propose an accelerated DP-ADMM (DP-AccADMM) using Nesterov's acceleration technique. Finally, we conduct numerical experiments on many real-world datasets to show the privacy-utility tradeoff of the two proposed algorithms; the comparative analysis shows that DP-AccADMM converges faster and has better utility than DP-ADMM when the privacy budget $\epsilon$ is larger than a threshold.
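The RDP-to-DP route mentioned above can be illustrated for the plain Gaussian mechanism. This sketch uses two standard facts: $T$-fold composition of a Gaussian mechanism with $L_2$ sensitivity $\Delta$ and noise scale $\sigma$ satisfies $(\alpha, T\alpha\Delta^2/(2\sigma^2))$-RDP, and $(\alpha, r)$-RDP implies $(r + \log(1/\delta)/(\alpha-1), \delta)$-DP. It ignores subsampling and any ADMM-specific sensitivity analysis, so it is not the paper's accounting; the grid of orders is an arbitrary choice.

```python
import math


def gaussian_rdp(alpha, sensitivity, sigma, steps):
    """RDP epsilon at order alpha for `steps` composed Gaussian mechanisms."""
    return steps * alpha * sensitivity ** 2 / (2 * sigma ** 2)


def rdp_to_dp(rdp_eps, alpha, delta):
    """Basic conversion: (alpha, r)-RDP implies (r + log(1/delta)/(alpha-1), delta)-DP."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)


def dp_epsilon(sensitivity, sigma, steps, delta, orders=None):
    """Smallest DP epsilon over a grid of RDP orders alpha > 1."""
    if orders is None:
        # Arbitrary grid: fine steps in (1, 11), then integer orders up to 256
        orders = [1 + k / 10 for k in range(1, 100)] + list(range(12, 257))
    return min(
        rdp_to_dp(gaussian_rdp(a, sensitivity, sigma, steps), a, delta)
        for a in orders
    )


eps = dp_epsilon(sensitivity=1.0, sigma=5.0, steps=100, delta=1e-5)
```

Optimizing over the order $\alpha$ after composing in RDP is what makes this route tighter than composing $(\epsilon,\delta)$ guarantees step by step.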

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2011.00164

Country:

North America > United States (0.14)
Asia > China > Shaanxi Province (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Gradient Perturbation is Underrated for Differentially Private Convex Optimization

Yu, Da, Zhang, Huishuai, Chen, Wei, Liu, Tie-Yan, Yin, Jian

arXiv.org Machine Learning, Nov-26-2019

Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy. Previous work first determines the noise level that can satisfy the privacy requirement and then analyzes the utility of noisy gradient updates as in the non-private case. In this paper, we explore how the privacy noise affects the optimization property. We show that for differentially private convex optimization, the utility guarantee of both DP-GD and DP-SGD is determined by an \emph{expected curvature} rather than the minimum curvature. The \emph{expected curvature} represents the average curvature over the optimization path, which is usually much larger than the minimum curvature and hence can yield a significantly improved utility guarantee. By using the \emph{expected curvature}, our theory justifies the advantage of gradient perturbation over other perturbation methods and closes the gap between theory and practice. Extensive experiments on real-world datasets corroborate our theoretical findings.
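DP-GD, one of the two algorithms analyzed here, can be sketched as full-batch gradient descent with Gaussian noise injected at every update. This illustration uses a least-squares objective chosen by us; it performs no gradient clipping and no privacy accounting, and it does not compute the expected curvature, which is a quantity in the paper's analysis rather than part of the algorithm.

```python
import numpy as np


def dp_gd_least_squares(X, y, lr, steps, sigma, rng):
    """Noisy full-batch gradient descent on (1 / (2n)) * ||X w - y||^2.

    Gaussian noise N(0, sigma^2 I) is added to every gradient update.
    Illustrative only: no clipping and no privacy accounting.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = w - lr * (grad + rng.normal(0.0, sigma, size=d))
    return w


X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_noisy = dp_gd_least_squares(X, y, lr=0.5, steps=500, sigma=0.1,
                              rng=np.random.default_rng(0))
```

With `sigma=0.0` the loop reduces to ordinary gradient descent and converges to the least-squares solution; with noise, the iterates fluctuate around it.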

artificial intelligence, curvature, machine learning, (19 more...)

arXiv.org Machine Learning

1911.11363

Country:

North America > United States (0.14)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)


Weighted Distributed Differential Privacy ERM: Convex and Non-convex

Kang, Yilin, Liu, Yong, Wang, Weiping

arXiv.org Machine Learning, Oct-22-2019

Distributed machine learning is an approach that allows different parties to learn a model over all data sets without disclosing their own data. In this paper, we propose a weighted distributed differential privacy (WD-DP) empirical risk minimization (ERM) method to train a model in a distributed setting, taking into account the different weights of different clients. We guarantee differential privacy by gradient perturbation, adding Gaussian noise, and advance the state of the art of the gradient perturbation method in the distributed setting. Through detailed theoretical analysis, we show that in the distributed setting, the noise bound and the excess empirical risk bound can be improved by considering the different weights held by multiple parties. Moreover, since the convexity requirement on the loss function in ERM is not easy to satisfy in some situations, we generalize our method to non-convex loss functions that satisfy the Polyak-Lojasiewicz condition. Experiments on real data sets show that our method is more reliable and improves the performance of distributed differential privacy ERM, especially when the data scale on different clients is uneven.

Introduction

In recent years, machine learning has been widely used in many fields such as data mining and pattern recognition (He et al. 2015; Xu, Ni, and Yang 2018; Wang et al. 2018; Zhang et al. 2019). Because machine learning algorithms need data for training, tremendous amounts of data are collected by individuals and companies.
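A minimal sketch of weighted aggregation with gradient perturbation in a distributed setting. The abstract does not specify the weights or where the noise is added, so this assumes weights proportional to each client's data size ($w_i = n_i / \sum_j n_j$) and that each client adds $\mathcal{N}(0, \sigma^2 I)$ noise to its own gradient before sharing it; the function name is hypothetical.

```python
import numpy as np


def weighted_noisy_gradient(client_grads, client_sizes, sigma, rng):
    """Weighted average of locally perturbed client gradients (hypothetical).

    Assumptions (not from the source): weight w_i = n_i / sum_j n_j, and
    each client perturbs its own gradient with N(0, sigma^2 I) locally.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    # Each client adds its own Gaussian noise before sharing
    noisy = [g + rng.normal(0.0, sigma, size=g.shape) for g in client_grads]
    # Server combines the shared gradients with data-size weights
    return sum(w * g for w, g in zip(weights, noisy))


rng = np.random.default_rng(0)
agg = weighted_noisy_gradient([np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                              client_sizes=[1, 3], sigma=0.1, rng=rng)
```

Weighting by data size gives clients with more data proportionally more influence, which is one way the uneven-data case mentioned in the abstract could be handled.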

artificial intelligence, excess empirical risk, machine learning, (16 more...)

arXiv.org Machine Learning

1910.10308

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.83)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
